Codecov Report

✅ All modified and coverable lines are covered by tests.

```diff
@@            Coverage Diff             @@
##             main     #402      +/-   ##
==========================================
- Coverage   63.15%   54.20%    -8.96%
==========================================
  Files          32       36       +4
  Lines        1900     5267    +3367
  Branches      204      656     +452
==========================================
+ Hits         1200     2855    +1655
- Misses        600     2114    +1514
- Partials      100      298     +198
```
```cpp
// NOTE: This introduces potential double-free risk with gc_repl_reqs() background thread.
// See https://github.com/eBay/HomeObject/issues/401.
auto hs_pg = m_hs_home_object->get_hs_pg(pg_id);
hs_pg->repl_dev_->clear_chunk_req(move_to_chunk);
```
Do we need similar handling for move_from_chunk?
Please move this change into purge_reserved_chunk, before vchunk->reset().
Moved. Also changed chunk to vchunk->get_chunk_id() (IIUC this also returns the pchunk id, please correct me if I'm wrong).
Good question. Since the chunk is already a move_to_chunk, meaning it is in the m_reserved_chunk_queue, will there still be new rreqs on this chunk? If I understand correctly, there are two types of write IO on a chunk: put and delete. For a put, since there is no open shard on the chunk, one must first be created, and that create operation will be blocked at select_specific_chunk because chunk_state=GC. A delete does not occupy space, so no blks are distributed for it. If that's the case, there is currently no risk. That said, it would be better if you could also review the blob-path logic while sorting out the shard process, to check for other potential GC concurrency issues.
Force-pushed 277fbe8 to e7df539
When GC resets move_to_chunk via purge_reserved_chunk(), stale repl_reqs may still exist and be cleaned up by the background gc_repl_reqs(). This causes two race conditions:

1. A stale rreq frees a blk on the NEW allocator after reset (wrong allocator).
2. A stale rreq frees a blk on the OLD allocator during reset, accessing a destroyed superblock and causing a crash.
Issue link: #401